A Phrase-Based Method for Hierarchical Clustering of Web Snippets

نویسندگان

  • Zhao Li
  • Xindong Wu
چکیده

Document clustering has been applied in web information retrieval, which facilitates users’ quick browsing by organizing retrieved results into different groups. Meanwhile, a tree-like hierarchical structure is wellsuited for organizing the retrieved results in favor of web users. In this regard, we introduce a new method for hierarchical clustering of web snippets by exploiting a phrase-based document index. In our method, a hierarchy of web snippets is built based on phrases instead of all snippets, and the snippets are then assigned to the corresponding clusters consisting of phrases. We show that, as opposed to the traditional hierarchical clustering, our method not only presents meaningful cluster labels but also improves clustering performance.

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

Hierarchical Fuzzy Clustering Semantics (HFCS) in Web Document for Discovering Latent Semantics

This paper discusses about the future of the World Wide Web development, called Semantic Web. Undoubtedly, Web service is one of the most important services on the Internet, which has had the greatest impact on the generalization of the Internet in human societies. Internet penetration has been an effective factor in growth of the volume of information on the Web. The massive growth of informat...

متن کامل

Mining Key Phrase Translations from Web Corpora

Key phrases are usually among the most information-bearing linguistic structures. Translating them correctly will improve many natural language processing applications. We propose a new framework to mine key phrase translations from web corpora. We submit a source phrase to a search engine as a query, then expand queries by adding the translations of topic-relevant hint words from the returned ...

متن کامل

Clustering the Web Comparing Clustering

Clustering – automatically sorting – web search results has been the focus of much attention but is by no means a solved problem, and there is little previous work in Swedish. This thesis studies the performance of three clustering algorithms – k-means, agglomerative hierarchical clustering, and bisecting k-means – on a total of 32 corpora, as well as whether clustering web search previews, cal...

متن کامل

unimelb: Topic Modelling-based Word Sense Induction for Web Snippet Clustering

This paper describes our system for Task 11 of SemEval-2013. In the task, participants are provided with a set of ambiguous search queries and the snippets returned by a search engine, and are asked to associate senses with the snippets. The snippets are then clustered using the sense assignments and systems are evaluated based on the quality of the snippet clusters. Our system adopts a preexis...

متن کامل

Cluster Generation and Cluster Labelling for Web Snippets: A Fast and Accurate Hierarchical Solution

This paper describes Armil, a meta-search engine that groups the Web snippets returned by auxiliary search engines into disjoint labelled clusters. The cluster labels generated by Armil provide the user with a compact guide to assessing the relevance of each cluster to his/her information need. Striking the right balance between running time and cluster well-formedness was a key point in the de...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

عنوان ژورنال:

دوره   شماره 

صفحات  -

تاریخ انتشار 2010